General Notes
There are some things that are generally true across system design questions
Functional Requirements
These requirements describe what the system must do, such as "allow a POST request to create a new User" or "be deployed in multiple partitioned regions for GDPR compliance"
These are the sorts of requirements you'd get from a product manager or some other stakeholder, and you'd then need to design a system with other engineers, given the current environment
Non-Functional Requirements
Non-functional requirements, I think, are things that are supposed to be derived from the functional requirements. For example, an NFR of "Scalable" could be inferred from "Needs to be deployed to all users" when we know there are millions of concurrent users at any time
Scalable
This usually just means the system is going to be called a lot, so we'll need partitions, which in turn means we need to choose where we sit on the Availability-Consistency spectrum
- Almost every system needs to be scalable, i.e. needs partitions, i.e. has its main trade-offs between availability and consistency
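One way to picture "needs partitions" is routing each key to one of several partitions by hashing it. This is a minimal sketch, assuming a fixed partition count; `NUM_PARTITIONS` and `partition_for` are hypothetical names used only for illustration.

```python
import hashlib

NUM_PARTITIONS = 4  # assumed, illustrative value


def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Map a key to a partition index using a stable hash."""
    # hashlib is used instead of the built-in hash(), which is salted per
    # process for strings and so would not route consistently across machines.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


if __name__ == "__main__":
    for user_id in ["user-1", "user-2", "user-3"]:
        print(user_id, "->", partition_for(user_id))
```

Plain modulo hashing like this remaps most keys when the partition count changes, which is one reason real systems often reach for consistent hashing instead.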
Durable
Basically we need to write to disk somewhere, so that data survives a process crash or restart
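As a rough illustration of "write to disk somewhere", the sketch below appends a record to a file and asks the OS to flush it to the storage device before returning. The helper names `durable_append` and `read_records` are hypothetical, and a real datastore does much more than this (replication, checksums, recovery).

```python
import os
import tempfile


def durable_append(path, record):
    """Append a record and ask the OS to flush it to the storage device."""
    with open(path, "a", encoding="utf-8") as f:
        f.write(record + "\n")
        f.flush()             # push Python's buffer down to the OS
        os.fsync(f.fileno())  # ask the OS to push its cache to disk


def read_records(path):
    """Read back every record, one per line."""
    with open(path, encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


if __name__ == "__main__":
    log_path = os.path.join(tempfile.mkdtemp(), "records.log")
    durable_append(log_path, "user-created:42")
    print(read_records(log_path))
```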
Availability-Consistency During Network Partition
When we know we want a scalable, partitioned system, we then need to choose what happens in the worst case (i.e. when the network between the partitions breaks)
- When our partitions can't talk to each other, do we keep the entire system up (Available) even though nodes that can't talk to each other may give different answers (not Consistent)?
- Or do we ensure the system stays in its last known Consistent state, but can't serve some requests (not Available)?
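The two choices above can be sketched with a toy two-node register. Everything here (`Node`, `make_pair`, `set_partition`, the `"CP"`/`"AP"` mode strings) is a hypothetical illustration, not a real replication protocol: in `"CP"` mode a node rejects writes while its peer is unreachable, and in `"AP"` mode it accepts them locally and lets the nodes diverge.

```python
class Unavailable(Exception):
    """Raised when a node refuses a request in order to stay consistent."""


class Node:
    def __init__(self, name, mode):
        self.name = name
        self.mode = mode          # "CP" or "AP" (assumed labels)
        self.value = None
        self.peer = None
        self.partitioned = False  # True while the peer is unreachable

    def write(self, value):
        if self.partitioned:
            if self.mode == "CP":
                # Stay in the last known consistent state; refuse the request.
                raise Unavailable(f"{self.name}: peer unreachable, write rejected")
            # AP: stay up and accept the write locally; the peer is now stale.
            self.value = value
            return
        # Healthy network: replicate synchronously to the peer.
        self.value = value
        self.peer.value = value

    def read(self):
        return self.value


def make_pair(mode):
    a, b = Node("a", mode), Node("b", mode)
    a.peer, b.peer = b, a
    return a, b


def set_partition(a, b, broken):
    a.partitioned = b.partitioned = broken


if __name__ == "__main__":
    a, b = make_pair("AP")
    a.write("v1")
    set_partition(a, b, True)
    a.write("v2")
    print("AP reads:", a.read(), b.read())  # the two nodes now disagree
```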
Availability-Consistency In General
Generally we still need to choose between highly performant availability and linearizable consistency, or something in between
- The typical example is read replicas of databases
- If we have N nodes, with 1 leader and N-1 read replicas, do we allow those N-1 replicas to serve reads after the leader may have gotten a new write that hasn't reached them yet?
- If we do, we're choosing high performance and high availability over strict consistency
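The read-replica trade-off can be seen in a small simulation. This is a sketch under simplifying assumptions (a single leader, an in-memory log, and replication that only happens when `catch_up` is called); the `Leader` and `Replica` classes are hypothetical and not tied to any particular database.

```python
class Leader:
    def __init__(self):
        self.data = {}
        self.log = []  # ordered list of writes that replicas pull from

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))

    def read(self, key):
        return self.data.get(key)


class Replica:
    def __init__(self, leader):
        self.leader = leader
        self.data = {}
        self.applied = 0  # number of log entries applied so far

    def catch_up(self):
        """Apply any log entries this replica has not seen yet."""
        for key, value in self.leader.log[self.applied:]:
            self.data[key] = value
        self.applied = len(self.leader.log)

    def read(self, key):
        # Served locally without contacting the leader: fast, but maybe stale.
        return self.data.get(key)


if __name__ == "__main__":
    leader = Leader()
    replica = Replica(leader)
    leader.write("x", 1)
    replica.catch_up()
    leader.write("x", 2)  # not yet replicated
    print("leader:", leader.read("x"), "replica:", replica.read("x"))
    replica.catch_up()
    print("after catch-up, replica:", replica.read("x"))
```

Letting the replica answer before `catch_up` runs is the "available and fast" choice; forcing reads through the leader (or waiting for the replica to catch up) is the stricter-consistency choice.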
There are ways to handle all of these trade-offs, and the Availability-Consistency spectrum spans a large space that's covered by many database types, isolation levels, and choices - all of them are useful in different scenarios
Typical Patterns
Almost every single system uses a similar set of patterns, and "typical reusable" components are listed throughout our repo, such as frontend infra, queue, KV store, and cache
Front End
- On the front end we usually have some sort of DNS resolution from hostname to IP address, where the IP address belongs to our Load Balancers
- Once a request reaches the (app) load balancers, they do TLS/SSL termination (decrypting the traffic) and forward it to either an API Gateway or straight to the web apps
- Arch for Typical frontend
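The request path above (DNS lookup, then a load balancer forwarding to web apps) can be sketched as follows. This is a toy model: the hostname, the documentation-range IP address, the backend names, and the round-robin policy are all assumptions for illustration, and TLS termination is omitted entirely.

```python
import itertools

# Hypothetical DNS table: hostname -> load balancer IP (documentation range).
DNS_RECORDS = {"app.example.com": "203.0.113.10"}


def resolve(hostname):
    """Stand-in for DNS resolution."""
    return DNS_RECORDS[hostname]


class LoadBalancer:
    def __init__(self, ip, backends):
        self.ip = ip
        self._next_backend = itertools.cycle(backends)  # round-robin order

    def forward(self, request):
        # A real load balancer would terminate TLS here before forwarding.
        backend = next(self._next_backend)
        return backend, request


if __name__ == "__main__":
    lb = LoadBalancer(resolve("app.example.com"), ["web-1", "web-2"])
    for _ in range(3):
        print(lb.forward("GET /users"))
```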
Queue
Queues are reused everywhere, and act as a buffer + messaging channel between multiple services
- Allows services to run at different rates, and scale differently, while ensuring messages are passed
- No longer need to directly call POST to a specific service IP we find in service discovery
- Can POST a message to a queue and it can get picked up by any number of workers
- Ensures messages are durably stored. If either of the services, or even the queue, goes down the messages are typically saved to disk somewhere
- Examples:
- SQS, which is the AWS implementation, and the typical "queue" we think of when talking about queues
- Celery queues, which are a very odd implementation that tie together producers and consumers
- Redis Queue, which can hold messages in memory or not, is configurable, and is sometimes distributed and sometimes on a single machine! Lots to look into!
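The decoupling a queue provides can be sketched with Python's standard-library `queue.Queue` as an in-process stand-in. This is not SQS, Celery, or Redis, and it is not durable (messages live only in memory); `run_demo` and the message names are hypothetical. The point is only that the producer puts messages on the queue without knowing which worker will pick each one up.

```python
import queue
import threading


def run_demo(num_workers=3, num_messages=10):
    """Push messages onto a queue and let any free worker consume them."""
    q = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker(worker_id):
        while True:
            msg = q.get()
            if msg is None:  # sentinel: no more work for this worker
                q.task_done()
                break
            with lock:
                results.append((worker_id, msg))
            q.task_done()

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_workers)]
    for t in threads:
        t.start()

    # The producer only talks to the queue, never to a specific worker.
    for i in range(num_messages):
        q.put(f"message-{i}")
    for _ in threads:
        q.put(None)  # one sentinel per worker

    q.join()  # wait until every item has been marked done
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    for worker_id, msg in run_demo():
        print(f"worker {worker_id} handled {msg}")
```

Which worker handles which message varies from run to run, which is exactly the property that lets producers and consumers scale independently.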